Skip to content

opentelemetry-sdk: drop NaN measurements at instrument level to prevent aggregation poisoning#5336

Open
RudraDudhat2509 wants to merge 4 commits into
open-telemetry:mainfrom
RudraDudhat2509:fix/drop-nan-measurements
Open

opentelemetry-sdk: drop NaN measurements at instrument level to prevent aggregation poisoning#5336
RudraDudhat2509 wants to merge 4 commits into
open-telemetry:mainfrom
RudraDudhat2509:fix/drop-nan-measurements

Conversation

@RudraDudhat2509

@RudraDudhat2509 RudraDudhat2509 commented Jun 19, 2026

Copy link
Copy Markdown

Summary

  • NaN measurements passed to metric instruments (e.g. counter.add(float("nan"))) were silently folded into running aggregations, permanently poisoning the sum, all subsequent exports emitted nan with no recovery path
  • Root cause: the existing if amount < 0 guard on Counter and Histogram is bypassed by NaN because all IEEE 754 comparisons against NaN return False
  • Fix: add a math.isnan() check at the instrument level (same layer as the existing negative-value guard) for all sync instruments (Counter, UpDownCounter, Histogram, Gauge) and in the async callback path, matching the approach taken in the Java SDK (open-telemetry/opentelemetry-java#5846, fixed in PR #5859)

Closes #5330

What was tested

  • Added test_add_nan to TestCounter and TestUpDownCounter
  • Added test_record_nan to TestHistogram
  • Added test_set_nan to TestGauge
  • Added test_nan_callback_value_is_dropped to TestObservableGauge (verifies NaN from async callbacks is skipped, valid measurements still pass through)
  • All 30 instrument tests pass (python -m pytest opentelemetry-sdk/tests/metrics/test_instrument.py)
  • Verified end-to-end: counter.add(1) → add(NaN) → add(5) now yields sum = 6, not nan

@RudraDudhat2509 RudraDudhat2509 requested a review from a team as a code owner June 19, 2026 16:15
@RudraDudhat2509

Copy link
Copy Markdown
Author

here is a ss of the fix for the teams reference

image

Comment thread .changelog/5336.fixed Outdated
for callback in self._callbacks:
try:
for api_measurement in callback(callback_options):
if math.isnan(api_measurement.value):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not catch math.inf though which I guess will have the same issue as math.nan, what about using math.isfinite to cover both instead?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch sir, completely missed the inf check, apologies, will update soon

@github-project-automation github-project-automation Bot moved this to Reviewed PRs that need fixes in Python PR digest Jun 22, 2026
@RudraDudhat2509

Copy link
Copy Markdown
Author

hey sir, switched all isnan checks to math.isfinite to cover both NaN and Inf, and added inf test cases for each instrument type

@RudraDudhat2509 RudraDudhat2509 requested a review from xrmx June 22, 2026 13:59
@xrmx xrmx moved this from Reviewed PRs that need fixes to Ready for review in Python PR digest Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Ready for review

Development

Successfully merging this pull request may close these issues.

opentelemetry-sdk: NaN measurements permanently poison metric aggregations

3 participants